Google is ushering in a new era of AI with a series of updates to the Gemini family of models. Gemini 1.0, Google's first natively multimodal model, launched in December in three sizes (Ultra, Pro and Nano). It was soon followed by 1.5 Pro, which brought improved performance and an expanded context window of 1 million tokens.
In light of user feedback, Google introduced Gemini 1.5 Flash to address the need for lower latency and lower serving cost. Lighter than 1.5 Pro, it is optimized for speed and efficiency and is suited to high-volume, high-frequency tasks. With a long context window of 1 million tokens, 1.5 Flash excels at tasks such as summarization, chat applications, image and video captioning, and extracting data from long documents and tables.
Google has also significantly improved 1.5 Pro, its best model for overall performance. Beyond expanding the context window to 2 million tokens, Google has improved code generation, logical reasoning and planning, multi-turn conversation, and audio and video understanding through data and algorithmic improvements. 1.5 Pro can now follow more complex and nuanced instructions, including ones that specify product-level behavior such as role, format and style.
Gemini Nano is also expanding beyond text-only inputs to include images. Starting with Pixel phones, apps that use Gemini Nano with Multimodality will be able to understand the world the way humans do.
With Project Astra, Google DeepMind is developing universal AI agents that can help in everyday life. Astra aims to build agents that understand and respond to context the way humans understand and react to a complex world. Designed to be proactive, approachable and personalized assistants, these agents will be able to interact with users naturally and without delay.
Astra is designed to process and remember video and speech input. Building on the Gemini model and other task-specific models, the agents process information faster by continuously encoding video frames, combining video and speech input into an event timeline, and caching that information for efficient recall.
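The idea of merging video and speech input into an event timeline and caching it for recall can be illustrated with a small sketch. This is not Astra's implementation, which Google has not published in detail. The names Event, merge_streams and TimelineCache are hypothetical, and the string payloads stand in for encoded frames and speech transcripts.

```python
import heapq
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List


@dataclass(frozen=True)
class Event:
    """One item on the timeline (hypothetical structure, for illustration)."""
    timestamp: float  # seconds since the session started
    modality: str     # "video" or "speech"
    payload: str      # stand-in for an encoded frame or a transcript snippet


def merge_streams(video: Iterable[Event], speech: Iterable[Event]) -> List[Event]:
    """Combine two streams, each already sorted by time, into one timeline."""
    return list(heapq.merge(video, speech, key=lambda e: e.timestamp))


class TimelineCache:
    """Bounded cache that keeps only the most recent events."""

    def __init__(self, max_events: int = 1000) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._events: Deque[Event] = deque(maxlen=max_events)

    def extend(self, events: Iterable[Event]) -> None:
        self._events.extend(events)

    def recall(self, start: float, end: float) -> List[Event]:
        """Return cached events whose timestamps fall within [start, end]."""
        return [e for e in self._events if start <= e.timestamp <= end]


video = [
    Event(0.0, "video", "frame: desk"),
    Event(1.0, "video", "frame: glasses on desk"),
]
speech = [Event(0.5, "speech", "where did I leave my glasses?")]

timeline = merge_streams(video, speech)

cache = TimelineCache(max_events=2)  # tiny limit to show eviction
cache.extend(timeline)
recent = cache.recall(0.0, 2.0)
```

In this sketch the cache holds only the two most recent events, so the earliest frame is evicted and a recall over the full time range returns the speech event followed by the later frame. A real system would store compact encodings rather than strings and would need a retrieval strategy beyond a simple time-range filter.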
Google is also continuing to develop Gemma, its family of open models. Gemma 2, the next generation of open models for responsible AI innovation, will feature a new architecture for breakthrough performance and efficiency and will be available in new sizes.
Google is continuously evolving the Gemini family of models to shape the future of AI, giving users access to smarter and more useful tools in their everyday lives.
Google Gemini is getting personalized!
Google continues to develop Gemini, its personal AI assistant. Designed as a conversational, intuitive and helpful assistant, Gemini helps you tackle complex tasks and can take action on your behalf. Available in the app or through the web experience, Gemini is constantly updated.
Analyze documents with the world’s longest context window
Google is offering Gemini Advanced subscribers its newest model, Gemini 1.5 Pro. With a context window starting at 1 million tokens, Gemini Advanced has the longest context window of any consumer chatbot in the world. This means that Gemini Advanced can make sense of multiple large documents totaling up to 1,500 pages or summarize 100 emails. Soon, it will be able to process an hour of video content or a codebase of more than 30,000 lines.
To take advantage of this expanded context window, you can upload your files to Gemini Advanced via Google Drive or directly from your device. Now you can quickly get answers and insights on dense documents, such as finding the details of the pet policy in your lease or comparing the key arguments of multiple lengthy research papers. Soon, Gemini Advanced will act like a data analyst, uncovering insights and creating custom visualizations and charts on the fly from uploaded data files such as spreadsheets.
More natural conversations with Gemini Live
Google is introducing new ways to interact with Gemini in a more natural way. With Gemini in Google Messages, you can now chat with Gemini in the same app where you text with friends.
In the coming months, a new mobile chat experience, Live, will be available for Gemini Advanced subscribers. This feature uses Google’s most advanced speech technology to make talking to Gemini more intuitive. With Gemini Live, you can talk to Gemini and choose from a variety of natural voices for it to respond with. You can speak at your own pace, just like in any conversation, or interrupt in the middle of a response with clarifying questions.
Creating complex plans just got easier
Travel planning often takes more time than the trip itself. Gemini Advanced’s new planning experience goes beyond showing a list of suggested activities and creates a customized itinerary for you.
For example, you might tell Gemini, “My family and I are going to Miami for Labor Day. My son likes art and my husband wants fresh seafood. Can you get my flight and hotel information from Gmail and help me plan the weekend?”
This request requires Gemini to do more than just provide publicly available information like other chatbots. Gemini takes into account your flight timing, food preferences and information about local museums, while also understanding where each stop is and how long it will take to travel between each activity. It pulls your flight information from Gmail, taps Google Maps for restaurant and museum recommendations near your hotel, and uses Search to suggest other activities to fill the rest of your day, like a walking tour of the Design District or beach time. It synthesizes all this information for you and creates a personal, customized itinerary that meets all your wishes. If you make changes or add more details, the itinerary is automatically updated.
Personalize Gemini with Gems
For an even more personalized experience, Gemini Advanced subscribers will soon be able to create Gems. A Gem is a customized version of Gemini. You can create any Gem you can imagine: a gym buddy, sous chef, coding partner or creative writing guide.
Setting up your Gem is easy too. Simply define what you want your Gem to do and how you want it to respond (for example, “You are my running coach, give me a daily running plan and be positive, upbeat and motivating”). Gemini will take these instructions and build on them in one click to create a Gem that fulfills your wishes.
Connect with more Google apps
Last year, Google brought Extensions directly to Gemini, allowing you to do more with the Google apps and services you already use. Right now, new extensions such as the YouTube Music Extension are rolling out to Gemini.
Soon, more Google tools such as Google Calendar, Tasks and Keep will be connected to Gemini. This means you can, for example, take a photo of your child’s school syllabus and ask Gemini to create a calendar entry for each assignment.
With these updates, Google is making Gemini a personal and customizable AI assistant that is more responsive to users’ needs.